{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
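    "# Real-Time Voice Conversation\n",
    "**Note ⚠️: The real-time voice feature is currently in internal beta. If you run into any problems, please open an issue or tell us in the WeChat group.**\n",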
    "\n",
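    "## Goal\n",
    "Build a real-time voice conversation feature that supports multiple voices. Using this cookbook as a reference, you can integrate real-time voice conversation into your own platform or application with the AppBuilder-SDK.\n",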
    "\n",
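    "## How It Works\n",
    "The program runs in a loop: it records the user's speech, converts it to text, sends the text to an Agent, and reads the Agent's reply aloud with TTS.\n",
    "* A large-model ASR service converts speech to text.\n",
    "* An Agent you create yourself handles the conversation, so it fits your application scenario and keeps the conversation context.\n",
    "* A large-model TTS service converts the reply to speech, which is then played back.\n",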
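    "\n",
    "The loop above can be sketched in plain Python by passing each stage in as a function. This is an illustrative sketch rather than the AppBuilder API: `voice_loop` and its `record`, `transcribe`, `ask_agent` and `speak` arguments are hypothetical stand-ins for `record_audio`, `run_asr`, `run_agent` and `run_tts_and_play_audio` in the example code, and `max_turns` exists only so the dry run ends (the example code loops forever).\n",
    "\n",
    "```python\n",
    "import re\n",
    "from typing import Callable, List\n",
    "\n",
    "URL_PATTERN = re.compile(r\"https?://\\S+\")\n",
    "\n",
    "\n",
    "def voice_loop(\n",
    "    record: Callable[[], bytes],\n",
    "    transcribe: Callable[[bytes], str],\n",
    "    ask_agent: Callable[[str], str],\n",
    "    speak: Callable[[str], None],\n",
    "    max_turns: int,\n",
    ") -> List[str]:\n",
    "    \"\"\"Run record -> ASR -> agent -> TTS for up to max_turns turns; return the spoken replies.\"\"\"\n",
    "    spoken = []\n",
    "    for _ in range(max_turns):\n",
    "        audio = record()  # 1. capture the user's speech\n",
    "        if not audio:  # nothing usable recorded: listen again\n",
    "            continue\n",
    "        query = transcribe(audio)  # 2. speech -> text\n",
    "        if not query:\n",
    "            continue\n",
    "        answer = ask_agent(query)  # 3. the agent answers the query\n",
    "        answer = URL_PATTERN.sub(\"\", answer)  # drop links so they are not read aloud\n",
    "        speak(answer)  # 4. text -> speech\n",
    "        spoken.append(answer)\n",
    "    return spoken\n",
    "\n",
    "\n",
    "# Stub stages for a dry run without a microphone or AppBuilder credentials\n",
    "replies = voice_loop(\n",
    "    record=lambda: b\"fake audio\",\n",
    "    transcribe=lambda audio: \"hello\",\n",
    "    ask_agent=lambda q: f\"you said {q}, see https://example.com\",\n",
    "    speak=print,\n",
    "    max_turns=1,\n",
    ")\n",
    "```\n",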
    "\n",
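    "## Prerequisites\n",
    "* Before using the built-in ASR and TTS components, activate the component services and purchase quota; see [Activate component services](https://cloud.baidu.com/doc/AppBuilder/s/Glqb6dfiz#3%E3%80%81%E5%BC%80%E9%80%9A%E7%BB%84%E4%BB%B6%E6%9C%8D%E5%8A%A1)\n",
    "* Install the `pyaudio` and `webrtcvad` packages with pip\n",
    "* Grant the program access to the microphone\n",
    "* Create your own Agent application\n",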
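    "\n",
    "To confirm the packages are installed (for example with `pip install pyaudio webrtcvad`), a quick standard-library check such as the following can be run first. It is a small sketch: `missing_modules` is a hypothetical helper, it only checks that the modules can be found on the import path (the AppBuilder SDK is imported as `appbuilder`), and it does not check microphone permission or component quota.\n",
    "\n",
    "```python\n",
    "import importlib.util\n",
    "from typing import Iterable, List\n",
    "\n",
    "\n",
    "def missing_modules(names: Iterable[str]) -> List[str]:\n",
    "    \"\"\"Return the module names that cannot be found on the import path.\"\"\"\n",
    "    return [name for name in names if importlib.util.find_spec(name) is None]\n",
    "\n",
    "\n",
    "# Modules imported by the example code\n",
    "missing = missing_modules([\"pyaudio\", \"webrtcvad\", \"appbuilder\"])\n",
    "print(\"Missing modules:\", missing if missing else \"none\")\n",
    "```\n",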
    "\n",
    "## Example Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright (c) 2024 Baidu, Inc. All Rights Reserved.\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License.\n",
    "\n",
    "import os\n",
    "import time\n",
    "import wave\n",
    "import sys\n",
    "import pyaudio\n",
    "import webrtcvad\n",
    "import appbuilder\n",
    "import re\n",
    "\n",
    "# Create a key on the Qianfan AppBuilder website; for the steps, see: https://cloud.baidu.com/doc/AppBuilder/s/Olq6grrt6#1%E3%80%81%E5%88%9B%E5%BB%BA%E5%AF%86%E9%92%A5\n",
    "# Set the token as an environment variable\n",
    "os.environ[\"APPBUILDER_TOKEN\"] = (\n",
    "    \"...\"\n",
    ")\n",
    "# ID of your published AppBuilder app\n",
    "app_id = \"...\"\n",
    "appbuilder.logger.setLoglevel(\"ERROR\")\n",
    "\n",
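    "# Audio capture settings. webrtcvad only accepts 16-bit mono PCM at 8/16/32/48 kHz\n",
    "# in frames of 10, 20 or 30 ms, so a 2-channel stream would make vad.is_speech fail.\n",
    "FORMAT = pyaudio.paInt16\n",
    "CHANNELS = 1\n",
    "RATE = 16000  # sample rate in Hz\n",
    "DURATION = 30  # frame length in ms\n",
    "CHUNK = RATE // 1000 * DURATION  # 480 samples per 30 ms frame\n",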
    "\n",
    "\n",
    "class Chatbot:\n",
    "    def __init__(self):\n",
    "        self.p = pyaudio.PyAudio()\n",
    "        self.tts = appbuilder.TTS()\n",
    "        self.asr = appbuilder.ASR()\n",
    "        self.agent = appbuilder.AppBuilderClient(app_id)\n",
    "        self.conversation_id = self.agent.create_conversation()\n",
    "\n",
    "    def run(self):\n",
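    "        # Spoken greeting (in Chinese):\n",
    "        # \"I am your personal chatbot. If you have any questions, just ask me.\"\n",
    "        self.run_tts_and_play_audio(\n",
    "            \"我是你的专属聊天机器人，如果你有什么问题，可以直接问我\"\n",
    "        )\n",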
    "        while True:\n",
    "            # 1. Record until the user stops speaking\n",
    "            audio_path = \"output.wav\"\n",
    "            print(\"Recording audio...\")\n",
    "            # Less than 1000 ms of detected speech: treat it as noise and listen again\n",
    "            if self.record_audio(audio_path) < 1000:\n",
    "                time.sleep(1)\n",
    "                continue\n",
    "            print(\"Recording finished\")\n",
    "\n",
    "            # 2. ASR: speech to text\n",
    "            print(\"Running ASR...\")\n",
    "            query = self.run_asr(audio_path)\n",
    "            print(\"ASR finished\")\n",
    "\n",
    "            # 3. Agent: get a reply within the same conversation\n",
    "            print(\"query: \", query)\n",
    "            if len(query) == 0:\n",
    "                continue\n",
    "            answer = self.run_agent(query)\n",
    "            # Print links instead of reading them aloud, and remove them from the answer\n",
    "            results = re.findall(r\"(https?://[^\\s]+)\", answer)\n",
    "            for result in results:\n",
    "                print(\"Link:\", result)\n",
    "                answer = answer.replace(result, \"\")\n",
    "            print(\"answer:\", answer)\n",
    "\n",
    "            # 4. TTS: synthesize the reply and play it\n",
    "            print(\"Running TTS and playing audio...\")\n",
    "            self.run_tts_and_play_audio(answer)\n",
    "            print(\"TTS playback finished\")\n",
    "\n",
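    "    def record_audio(self, path):\n",
    "        \"\"\"Record until the user stops speaking; return the detected speech length in ms.\"\"\"\n",
    "        frames_per_sec = 1000 // DURATION  # 33 frames of 30 ms per second\n",
    "        with wave.open(path, \"wb\") as wf:\n",
    "            wf.setnchannels(CHANNELS)\n",
    "            wf.setsampwidth(self.p.get_sample_size(FORMAT))\n",
    "            wf.setframerate(RATE)\n",
    "            stream = self.p.open(\n",
    "                format=FORMAT,\n",
    "                channels=CHANNELS,\n",
    "                rate=RATE,\n",
    "                input=True,\n",
    "                frames_per_buffer=CHUNK,\n",
    "            )\n",
    "            vad = webrtcvad.Vad(1)  # aggressiveness: 0 (least) to 3 (most)\n",
    "            not_speech_times = 0\n",
    "            speech_times = 0\n",
    "            total_times = 0\n",
    "            start_up_times = frames_per_sec * 5  # first window: 5 seconds\n",
    "            history_speech_times = 0\n",
    "            while True:\n",
    "                # Stop once more than 10 seconds of speech have been recorded\n",
    "                if history_speech_times > frames_per_sec * 10:\n",
    "                    break\n",
    "                data = stream.read(CHUNK, exception_on_overflow=False)\n",
    "                if vad.is_speech(data, RATE):\n",
    "                    speech_times += 1\n",
    "                    wf.writeframes(data)  # only speech frames are saved\n",
    "                else:\n",
    "                    not_speech_times += 1\n",
    "                total_times += 1\n",
    "                if total_times >= start_up_times:\n",
    "                    history_speech_times += speech_times\n",
    "                    # End of window: more than 70% non-speech means the user has stopped\n",
    "                    if not_speech_times / total_times > 0.7:\n",
    "                        break\n",
    "                    # Otherwise restart the counters for the next window (a simple sliding window),\n",
    "                    # halving its length each time down to a minimum of 1 second\n",
    "                    not_speech_times = 0\n",
    "                    speech_times = 0\n",
    "                    total_times = 0\n",
    "                    start_up_times = max(start_up_times // 2, frames_per_sec)\n",
    "            stream.stop_stream()\n",
    "            stream.close()\n",
    "            return history_speech_times * DURATION\n",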
    "\n",
    "    def run_tts_and_play_audio(self, text: str):\n",
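    "        # Docs for AppBuilder's built-in TTS; adjust the parameters as needed: https://github.com/baidubce/app-builder/tree/master/python/core/components/tts\n",
    "        # With model=\"paddlespeech-tts\", speed/pitch/volume/person have no effect, and\n",
    "        # stream=True requires audio_type=\"pcm\" (see the TTS parameter table below).\n",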
    "        msg = self.tts.run(\n",
    "            appbuilder.Message(content={\"text\": text}),\n",
    "            speed=5,\n",
    "            pitch=5,\n",
    "            volume=5,\n",
    "            person=0,\n",
    "            audio_type=\"pcm\",\n",
    "            model=\"paddlespeech-tts\",\n",
    "            stream=True,\n",
    "        )\n",
    "        # Play the streamed 16-bit mono PCM chunks as they arrive\n",
    "        stream = self.p.open(\n",
    "            format=self.p.get_format_from_width(2),\n",
    "            channels=1,\n",
    "            rate=24000,\n",
    "            output=True,\n",
    "            frames_per_buffer=2048,\n",
    "        )\n",
    "        for pcm in msg.content:\n",
    "            stream.write(pcm)\n",
    "        stream.stop_stream()\n",
    "        stream.close()\n",
    "\n",
    "    # Docs for AppBuilder's built-in ASR; adjust the parameters as needed: https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md\n",
    "    def run_asr(self, audio_path: str):\n",
    "        with open(audio_path, \"rb\") as f:\n",
    "            content_data = {\n",
    "                \"audio_format\": \"wav\",\n",
    "                \"raw_audio\": f.read(),\n",
    "                \"rate\": RATE,  # must match the recording sample rate (16 kHz)\n",
    "            }\n",
    "            msg = appbuilder.Message(content_data)\n",
    "            out = self.asr.run(msg)\n",
    "            # Use the first recognition result\n",
    "            text = out.content[\"result\"][0]\n",
    "            return text\n",
    "\n",
    "    def run_agent(self, query):\n",
    "        # Stream the Agent's reply within the same conversation and join the chunks\n",
    "        msg = self.agent.run(self.conversation_id, query, stream=True)\n",
    "        answer = \"\"\n",
    "        for content in msg.content:\n",
    "            answer += content.answer\n",
    "        return answer\n",
    "\n",
    "\n",
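    "if __name__ == \"__main__\":\n",
    "    chatbot = Chatbot()\n",
    "    try:\n",
    "        chatbot.run()\n",
    "    except KeyboardInterrupt:\n",
    "        # Interrupt (Ctrl+C or the notebook's stop button) to end the conversation loop\n",
    "        pass\n",
    "    finally:\n",
    "        chatbot.p.terminate()  # release PortAudio resources"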
   ]
  },
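  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The end-of-speech rule in `record_audio` can be tried without a microphone. The sketch below is a hypothetical helper, `simulate_endpoint`, that replays the same counters over a list of per-frame VAD decisions (`True` means webrtcvad classified the 30 ms frame as speech) instead of reading audio: the first window lasts 5 seconds (about 33 frames per second), a window that is more than 70% non-speech ends the recording, each later window is half as long down to 1 second, and recording also stops after more than 10 seconds of speech. It returns the number of frames consumed and the speech length in ms, the value `record_audio` returns.\n",
    "\n",
    "```python\n",
    "from typing import Sequence, Tuple\n",
    "\n",
    "DURATION = 30  # frame length in ms, as in the example\n",
    "FRAMES_PER_SEC = 1000 // DURATION  # 33 frames per second\n",
    "\n",
    "\n",
    "def simulate_endpoint(frames: Sequence[bool]) -> Tuple[int, int]:\n",
    "    \"\"\"Return (frames consumed, speech length in ms) for a sequence of VAD decisions.\"\"\"\n",
    "    not_speech = speech = total = history_speech = consumed = 0\n",
    "    window = FRAMES_PER_SEC * 5  # first window: 5 seconds\n",
    "    for is_speech in frames:\n",
    "        if history_speech > FRAMES_PER_SEC * 10:  # more than 10 s of speech: stop\n",
    "            break\n",
    "        consumed += 1\n",
    "        if is_speech:\n",
    "            speech += 1\n",
    "        else:\n",
    "            not_speech += 1\n",
    "        total += 1\n",
    "        if total >= window:\n",
    "            history_speech += speech\n",
    "            if not_speech / total > 0.7:  # mostly non-speech: the user has stopped\n",
    "                break\n",
    "            not_speech = speech = total = 0\n",
    "            window = max(window // 2, FRAMES_PER_SEC)  # shorter windows, at least 1 s\n",
    "    return consumed, history_speech * DURATION\n",
    "\n",
    "\n",
    "# 5 s of speech followed by silence, then pure silence\n",
    "print(simulate_endpoint([True] * 165 + [False] * 200))  # (247, 4950)\n",
    "print(simulate_endpoint([False] * 200))  # (165, 0)\n",
    "```\n",
    "\n",
    "In `run`, a result under 1000 ms, such as the all-silence case, is discarded and the program listens again."
   ]
  },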
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "Run the program as is.\n",
    "\n",
    "You can also replace any of the following modules with your own implementation or model:\n",
    "* `record_audio`: audio recording\n",
    "* `run_asr`: speech recognition, see the [AppBuilder ASR component docs](https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md)\n",
    "* `run_agent`: conversation with the Agent\n",
    "* `run_tts_and_play_audio`: speech synthesis of the reply and playback, see the [AppBuilder TTS component docs](https://github.com/baidubce/app-builder/blob/master/python/core/components/tts/README.md)\n",
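    "\n",
    "If you replace `record_audio`, the file it produces still has to match what `run_asr` sends: WAV data with `\"rate\": 16000`. The sketch below assumes 16-bit mono samples, the layout webrtcvad accepts (check the ASR component docs for the exact requirements), and uses only the standard `wave` module; `write_pcm_wav` and `is_asr_ready` are hypothetical helpers.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import wave\n",
    "\n",
    "RATE = 16000  # sample rate sent to the ASR in run_asr\n",
    "SAMPLE_WIDTH = 2  # bytes per sample (16-bit PCM), assumed\n",
    "CHANNELS = 1  # mono, assumed\n",
    "\n",
    "\n",
    "def write_pcm_wav(target, pcm: bytes) -> None:\n",
    "    \"\"\"Wrap raw 16-bit mono PCM in a WAV container (target: path or binary file object).\"\"\"\n",
    "    with wave.open(target, \"wb\") as wf:\n",
    "        wf.setnchannels(CHANNELS)\n",
    "        wf.setsampwidth(SAMPLE_WIDTH)\n",
    "        wf.setframerate(RATE)\n",
    "        wf.writeframes(pcm)\n",
    "\n",
    "\n",
    "def is_asr_ready(source) -> bool:\n",
    "    \"\"\"Check that a WAV file uses the 16 kHz, 16-bit, mono layout assumed above.\"\"\"\n",
    "    with wave.open(source, \"rb\") as wf:\n",
    "        return (\n",
    "            wf.getframerate() == RATE\n",
    "            and wf.getsampwidth() == SAMPLE_WIDTH\n",
    "            and wf.getnchannels() == CHANNELS\n",
    "        )\n",
    "\n",
    "\n",
    "# One second of silence: 16000 samples x 2 bytes, written to memory instead of disk\n",
    "buf = io.BytesIO()\n",
    "write_pcm_wav(buf, b\"\\x00\\x00\" * RATE)\n",
    "buf.seek(0)\n",
    "print(is_asr_ready(buf))  # True\n",
    "```\n",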
    "\n",
    "**AppBuilder TTS component parameters**\n",
    "| Parameter | Type | Required | Description | Example |\n",
    "|---|---|---|---|---|\n",
    "| message | Message | Yes | Message whose content holds the text to synthesize | Message(content={\"text\": \"text to synthesize\"}) |\n",
    "| model | String | No | Defaults to `baidu-tts`; valid values: `paddlespeech-tts`, `baidu-tts` | paddlespeech-tts |\n",
    "| speed | Integer | No | Speech speed, 0 to 15, default 5 (medium). Only takes effect with `baidu-tts` and is ignored with `paddlespeech-tts` | 5 |\n",
    "| pitch | Integer | No | Speech pitch, 0 to 15, default 5 (medium). Only takes effect with `baidu-tts` and is ignored with `paddlespeech-tts` | 5 |\n",
    "| volume | Integer | No | Speech volume, 0 to 15, default 5 (medium). Only takes effect with `baidu-tts` and is ignored with `paddlespeech-tts` | 5 |\n",
    "| person | Integer | No | Voice, default 0 (度小美). Standard voices: 0 (度小美), 1 (度小宇), 3 (度逍遥, basic), 4 (度丫丫). Premium voices: 5003 (度逍遥, premium), 5118 (度小鹿), 106 (度博文), 110 (度小童), 111 (度小萌), 103 (度米朵), 5 (度小娇). Top-tier voices: 4003 (度逍遥, emotional male), 4106 (度博文, professional male anchor), 4115 (度小贤, male radio host), 4119 (度小鹿, sweet female), 4105 (度灵儿, clear and spirited female), 4117 (度小乔, lively female), 4100 (度小雯, energetic female anchor), 4103 (度米朵, cute female), 4144 (度姗姗, entertainment female), 4278 (度小贝, female knowledge-show host), 4143 (度清风, male voice-over), 4140 (度小新, professional female anchor), 4129 (度小彦, male knowledge-show host), 4149 (度星河, male advertising voice), 4254 (度小清, female advertising voice), 4206 (度博文, male variety-show voice), 4226 (南方, female radio host). Only takes effect with `baidu-tts` and is ignored with `paddlespeech-tts` | 0 |\n",
    "| audio_type | String | No | Audio format. With `baidu-tts`: `mp3` or `wav`. With `paddlespeech-tts` and non-streaming output: must be `wav`. With `paddlespeech-tts` and streaming output: must be `pcm` | wav |\n",
    "| stream | Bool | No | Defaults to False. Streaming output is currently supported by `paddlespeech-tts` only, not by `baidu-tts` | False |\n",
    "| retry | Integer | No | Number of HTTP retries | 3 |\n",
    "| timeout | Integer | No | HTTP timeout | 5 |"
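    ,"\n",
    "The constraints in the table can be checked before calling TTS. This is a minimal sketch that encodes only the rules stated in the table; `check_tts_args` is a hypothetical helper, not part of the AppBuilder SDK, and it returns a list of problems instead of raising, so an empty list means the combination is consistent with the table.\n",
    "\n",
    "```python\n",
    "from typing import List\n",
    "\n",
    "\n",
    "def check_tts_args(model: str = \"baidu-tts\", audio_type: str = \"wav\",\n",
    "                   stream: bool = False, speed: int = 5, pitch: int = 5,\n",
    "                   volume: int = 5) -> List[str]:\n",
    "    \"\"\"Return the problems the parameter table predicts for these TTS arguments.\"\"\"\n",
    "    problems = []\n",
    "    if model not in (\"baidu-tts\", \"paddlespeech-tts\"):\n",
    "        return [f\"unknown model: {model}\"]\n",
    "    if model == \"baidu-tts\":\n",
    "        if stream:\n",
    "            problems.append(\"baidu-tts does not support stream=True\")\n",
    "        if audio_type not in (\"mp3\", \"wav\"):\n",
    "            problems.append(\"baidu-tts needs audio_type mp3 or wav\")\n",
    "        for name, value in ((\"speed\", speed), (\"pitch\", pitch), (\"volume\", volume)):\n",
    "            if not 0 <= value <= 15:\n",
    "                problems.append(f\"{name} must be between 0 and 15\")\n",
    "    else:  # paddlespeech-tts ignores speed/pitch/volume/person\n",
    "        expected = \"pcm\" if stream else \"wav\"\n",
    "        if audio_type != expected:\n",
    "            problems.append(f\"paddlespeech-tts with stream={stream} needs audio_type={expected}\")\n",
    "    return problems\n",
    "\n",
    "\n",
    "# The combination used in run_tts_and_play_audio\n",
    "print(check_tts_args(model=\"paddlespeech-tts\", audio_type=\"pcm\", stream=True))  # []\n",
    "print(check_tts_args(model=\"baidu-tts\", stream=True))  # ['baidu-tts does not support stream=True']\n",
    "```"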
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
